Turbo Pascal Version 4.0 Observations
12/06/87

   Introduction and Comments
   3.x compared with 4.0
   Value assignment speed tests
   Incrementing / decrementing integers
   Control variable (counters) speed tests
   Loop structure speed tests
   Procedure and Function speed tests
   String manipulation speed tests
   Screen I/O speed tests
   Simple / useful routines


Introduction and Comments:

When I received the upgrade to 4.0 for Turbo Pascal I was mildly
excited. But as I started going through the new manual,
experimenting and trying out some of the new features, I felt like
a kid with a new toy. Turbo Pascal 4.0 is nothing short of awesome!
I love it! Borland has certainly outdone themselves with this
release. It's faster, it's more powerful, it's easier to use, and a
big surprise is that it can generate a smaller .EXE file than an
equivalent .COM file in version 3.x.

In the README documentation on the compiler distribution diskette
it is mentioned that the system unit traps interrupt $24 (DOS
Critical Error) automatically and allows error checking via the
usual IOResult with {$I-}. That sounded remarkable! It would sure
simplify a number of things, but for some reason the old
"KeyPressed" function came to mind. So the very first thing tried
was to get the free space on a floppy diskette in an empty drive
(i.e. no diskette present, door open) ... it worked like a champ.
No ugly DOS "Abort, Retry, Ignore" message, no runtime error, just
an error code in IOResult. Same thing with an unformatted diskette.
Very impressive! By the way, the new "KeyPressed" function works
great too!

Borland has done the programming community a tremendous favor with
this release. It is superb.

There are two minor things I wouldn't mind seeing changed in the
next update. The Ctrl-F1 syntax help feature is tremendous; it will
really save wear and tear on the manuals.
However, it would be nice to see the ability to add new information
to these help windows, to cover your own custom routines. Well,
maybe you can and I just haven't figured out how yet.

The other thing is in the editor's Find-And-Replace defaults. I
rarely use the same search strings twice in a row, but saving the
last search string is kind of a neat idea. However, positioning the
cursor at the end of the string is rather confusing at first. Sure,
if you press any character key the field is cleared and ready for a
brand new search string, but it would be less confusing if the
cursor were placed at the beginning of the string. Pressing would
still retain the default, and it would be a little more intuitive.
But these are just dippy little complaints and perhaps not even
worth mentioning here.

At this point there is no doubt that release 4.0 will simplify and
shorten the programming process, in addition to generating faster
and more compact code. It's what a programming language should be!

In order to compare various operations for speed, a looping
structure was set up which allowed the item tested to be executed
between .5 and 34 million times, depending on the operation, with 5
repetitions of the loop to average the results.

For timing purposes, it's easy to use the PC's master clock count
stored in the ROM communication area of low memory (in the 0 block)
at offset $046C. This 4 byte (2 word) number is used to record each
clock tick (occurring about 18.2 times per second) from midnight to
midnight. A LongInt really isn't necessary because we don't need
the whole number for these relatively short and simple tests; the
low word is enough. For more information refer to chapter 3 of
Peter Norton's "Programmer's Guide to the IBM PC." This is
basically the same technique used in Brian Foley's QKSTR.PAS
demonstration.
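
The reason the low word is enough can be shown with a little
arithmetic. What follows is only a minimal sketch, not part of the
benchmark program: the names ElapsedTicks and TicksToSeconds are
made up for illustration, and 18.2 is the approximate tick rate
quoted above. As long as a test runs for fewer than 65,536 ticks
(roughly an hour), the difference between two low-word readings,
taken modulo 65,536, is still correct even if the low word wraps
around between the readings.

```pascal
Program TickArithmetic;

{ Illustrative sketch only: ElapsedTicks and TicksToSeconds are  }
{ hypothetical helpers, not routines from the benchmark program. }

Const
   TicksPerSecond = 18.2;      { approximate rate quoted in the text }

Function ElapsedTicks(StartTick, StopTick : Word) : Word;
{ Difference of two low-word clock readings, modulo 65536, so a }
{ single wrap of the low word between readings does not matter. }
Var
   Diff : LongInt;
Begin
   Diff := LongInt(StopTick) - LongInt(StartTick);
   If Diff < 0 then
      Diff := Diff + 65536;
   ElapsedTicks := Diff;
End;

Function TicksToSeconds(Ticks : Word) : Real;
{ Convert a tick count to (approximate) seconds }
Begin
   TicksToSeconds := Ticks / TicksPerSecond;
End;

Begin
   Writeln(ElapsedTicks(100, 300), ' ticks (no wrap)');
   Writeln(ElapsedTicks(65530, 4), ' ticks (low word wrapped)');
   Writeln(TicksToSeconds(182):0:1, ' seconds for 182 ticks');
End.
```

The first two lines should report 200 and 10 ticks respectively. A
test that runs longer than about an hour would need the full 4 byte
count.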
For these simple timing tests an absolute variable was declared as
follows:

   Var
      MasterClock : Word ABSOLUTE $0000:$046C;

While $0040:$006C is also correct, the other style helps you
remember that it is in the zero block of memory.

These tests were run on a Kaypro PC-10 (NEC V20 processor) running
at 8 MHz. Execution times may vary depending on your machine.


Turbo Pascal 3.1 vs. 4.0:

First the empty loop structure was timed. Each test was then timed
with the test code inserted into the loop, and the empty loop time
was subtracted to get the extra time taken by the test code itself.
Here are some interesting statistics which turned up.

   Version 3.1 .COM file size    11,575 bytes
   Version 4.0 .EXE file size     5,264 bytes

So the version 4.0 file was about 55% smaller.

The 3.1 .COM file was optimized using Turbo Optimizer version 1.02
by Turbo Power Software (TOPT defaults, TLC compact mode). The
optimized .COM file size was 2,052 bytes, or about 82% smaller than
the 3.1 version, and about 61% smaller than the 4.0 version. This
was not too surprising.

As for speed differences, the results were as follows:

   Version 4.0 was about 8% faster than version 3.1 in the empty
   loop timing tests.

   The optimized version of 3.1 was about 11% faster than the
   version 4.0 code.

Release 4.0 of Turbo Pascal will not make Turbo Optimizer obsolete.
Perhaps a new version of the Optimizer will soon be released. The
improvements made by Borland may eliminate the need for TLC (Turbo
Library Compactor), but surely not for TOPT (Turbo OPTimizer).

The loop structure used is not very fancy or efficient, but it does
give a basic idea of the speed of various operations.
Obviously you could produce a much more accurate test, but this was
used for now:

   Program BenchMarkTests;

   Uses CRT;

   Type
      MaxString = String;              { same as string[255] }

   Var
      MasterClock : Word ABSOLUTE $0000:$046C;
      StartTime,
      StopTime    : Word;
      OuterLoop,
      InnerLoop,
      A           : Word;

   Begin
      For OuterLoop := 1 to 5 do
         Begin
            { any necessary assignments here }
            StartTime := MasterClock;
            For InnerLoop := 1 to 10000 do      { 10000 varies }
               Begin
                  A := 0;
                  Repeat
                     Inc(A);
                     { test code goes here }
                  Until A = 255;
               End;   { of Inner Loop }
            StopTime := MasterClock;
            Writeln(StopTime - StartTime,' Master Clock Ticks.');
         End;   { of Outer Loop }
      Sound(1000);
      While not KeyPressed do;   { nothing }
      NoSound;
   End.

After determining how long the empty loop took to run, that amount
was subtracted from the final clock tick count. The result was that
0 master clock ticks were indicated when the loop was run (empty),
give or take 1 clock tick.


Value Assignment speed tests:

One set of tests involved assigning a value to a variable. In this
case different types of variables were used but the value was
always 100. For instance:

   Num := 100;

Also, 100 was assigned to a variable X of the same type as Num, and
was timed as follows:

   Num := X;

Both methods stored a value of 100 into Num. Here are the results:

   Variable |  Clock ticks during 2,560,000 repetitions
   type     |   Num := 100   |   Num := X
   _________|________________|_______________
   ShortInt |      159       |      200
   Byte     |      159       |      200
   Integer  |      200       |      250
   Word     |      200       |      250
   LongInt  |      405       |      544

So assignments work faster on byte size quantities, which was
surprising since the 8086 family normally works with Word size
quantities. LongInts take about twice as long as Integers, as
expected. So if you are in a hurry, don't use LongInts where
Integers or Words will suffice.

How about value assignment between different types of integers?
After executing each assignment about 2.5 million times, the
following was learned (keep in mind that these values are
approximate and rounded off):

   Integer Type Transfer    |  Master clock ticks recorded
   _________________________|_______________________________
   ShortInt to Byte         |   200  (232,960   per second)
   Byte to ShortInt         |   200  (232,960   per second)
   Integer to Word          |   249  (187,116.5 per second)
   Word to Integer          |   249  (187,116.5 per second)
   ShortInt to Integer      |   249  (187,116.5 per second)
   Byte to Integer          |   276  (168,811.6 per second)
   ShortInt to LongInt      |   429  (108,606.1 per second)
   Byte to LongInt          |   479  ( 97,269.3 per second)
   Integer to LongInt       |   429  (108,606.1 per second)
   Word to LongInt          |   455  (102,400   per second)
   LongInt to LongInt       |   543  ( 85,804.8 per second)


Incrementing / Decrementing Numbers:

As for incrementing and decrementing numbers, the following
information was accumulated after 64 million repetitions of each
method:

             | Clock Ticks |
             |   during    |
   Method    |  64 M Reps  |  Times per second (about)
   __________|_____________|____________________________
   X + 1     |    1,401    |       831,406
   Succ(X)   |    1,275    |       913,568.6
   Inc(X)    |      883    |     1,319,139.3
             |             |
   X - 1     |    1,404    |       829,629.6
   Pred(X)   |    1,275    |       913,568.6
   Dec(X)    |      883    |     1,319,139.3

   Percentage faster than "X + 1":    Succ(X) =  9%
                                      Inc(X)  = 37%

   Percentage faster than "X - 1":    Pred(X) =  9%
                                      Dec(X)  = 37%


Control Variable (Counters) speed tests:

In using various integer types as control variables (counters) in
loops, the following results were noted:

   After about 25 million repetitions, there was no significant
   difference in speed between Bytes and ShortInts.

   After about 170 million repetitions it was noted that Words
   (unsigned) perform about 4% faster than Integers (signed).

When used as counters in a FOR loop:

   Byte values count the fastest (about 19% faster than Integers).

   Words count about 4% faster than Integers.
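
The "per second" and "percentage faster" figures in the tables
above can be checked with a few lines of arithmetic. This is only a
sketch: the names OpsPerSecond and PercentFaster are made up for
illustration, 18.2 ticks per second is the approximate rate quoted
earlier, and it assumes "percentage faster" means the time saved
relative to the slower method (which does reproduce the 9% and 37%
shown).

```pascal
Program TickRates;

{ Illustrative sketch only: OpsPerSecond and PercentFaster are   }
{ hypothetical helper names, and 18.2 ticks per second is the    }
{ approximate rate quoted earlier in the text.                   }

Const
   TicksPerSecond = 18.2;

Function OpsPerSecond(Reps, Ticks : Real) : Real;
{ Repetitions divided by elapsed seconds (Ticks / 18.2) }
Begin
   OpsPerSecond := Reps * TicksPerSecond / Ticks;
End;

Function PercentFaster(SlowTicks, FastTicks : Real) : Real;
{ Time saved by the faster method, as a percentage of the slower }
Begin
   PercentFaster := 100.0 * (SlowTicks - FastTicks) / SlowTicks;
End;

Begin
   Writeln('ShortInt to Byte : ', OpsPerSecond(2560000.0, 200.0):0:1);
   Writeln('Inc(X)           : ', OpsPerSecond(64000000.0, 883.0):0:1);
   Writeln('Succ(X) vs X + 1 : ', PercentFaster(1401.0, 1275.0):0:0, '%');
   Writeln('Inc(X) vs X + 1  : ', PercentFaster(1401.0, 883.0):0:0, '%');
End.
```

The same two functions can be applied to any row of the tables by
substituting its repetition count and tick count.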


Loop Structure speed tests:

In comparing the various looping structures the FOR loop was the
slowest because of the counter variable. An interesting item came
to light in comparing the following loops (after more than 25
million repetitions):

   { X := 0 to start }

   While X < 255 do
      Inc(X);

   Repeat
      Inc(X);
   Until X = 255;

The REPEAT ... UNTIL loop ran about 9% faster than the WHILE loop.


Procedure and Function speed tests:

In comparing Procedures and Functions the following were used:

   Procedure Test;
   Begin
      A := Succ(A);
   End;

   (Where A is a global variable and the command timed was
   "Test;")

   Procedure Test( Var A : <type> );
   Begin
      A := Succ(A);
   End;

   (Where A is passed as a variable (by reference) and the command
   timed was "Test(A);")

   Function Test( A : <type> ) : <type>;
   Begin
      Test := Succ(A);
   End;

   (Where the command timed was "A := Test(A);")

The results were much as you would expect. In all cases the
procedure which altered a global variable performed the fastest.

                   |    Clock ticks during .25 M reps
   Method          |  Byte  |  Word  | Integer | LongInt
   ________________|________|________|_________|__________
   Procedure Test  |  189   |  196   |   197   |   263
   Proc Test(Var A |  257   |  265   |   265   |   328
   Function Test(A |  246   |  263   |   282   |   372

For Procedures altering global variables, Bytes are fastest,
LongInts are slowest.

For Procedures altering variables passed as a variable parameter,
again Bytes are fastest, LongInts are slowest.

For Functions Bytes are also fastest, LongInts are slowest.

For Bytes and Words a function is slightly faster than a procedure
which uses a variable parameter. The only real surprise is that for
Integers (and, by the table, LongInts as well) a procedure using a
variable parameter is faster than a function.


String Manipulation speed tests:

Page 121 of the 4.0 manual recommends using string library routines
such as LENGTH and COPY rather than directly accessing the internal
string structure.
Example:

   Type
      Str = String[255];

   Var
      Name       : Str;
      LengthByte : Byte ABSOLUTE Name;

In the above circumstances the following is true:

   Length(Name) = Ord(Name[0]) = LengthByte     (All are equal)

So comparison tests were run using techniques mentioned in the
Turbo Optimizer Manual, the November 1987 PC Magazine, and Brian
Foley's (76317,3247) quick string manipulation demonstration
(QKSTR.PAS). The results were VERY interesting! Averages were
determined after executing each item about 2.25 million times.

String Length Determination:

   TestString := 'ABCDEFGHIJKLMNOPQRSTUVWXYZ.abcdef...';

   Str1 := Length(TestString);
   Str1 := Ord(TestString[0]);
   Str1 := LengthByte;

After over 2 million repetitions there was NO significant
difference in speed between any of the above techniques! Borland
has greatly improved the speed of Turbo's Length Function!

Delete last character from string:

   Delete(TestString,LengthByte,1);
   TestString[0] := Pred(TestString[0]);
   Dec(TestString[0]);
   LengthByte := Pred(LengthByte);
   Dec(LengthByte);

DEC(TestString[0]) and DEC(LengthByte) were the fastest, operating
at about the same speed. This method is over 3,000% faster than
using the DELETE procedure.

TestString[0] := PRED(TestString[0]) and LengthByte :=
PRED(LengthByte) operated at about the same speed and were over
2,600% faster than Turbo's DELETE procedure.

Delete first character from string:

   Delete(TestString,1,1);

   Move(TestString[2],TestString[1],Pred(LengthByte));
   Dec(LengthByte);

The latter (the MOVE followed by DEC) performed about 4 1/2 times
faster than the former.


Screen I/O speed tests:

As for screen I/O, 4.0 incorporates tremendous improvement in speed
over 3.x. The manual mentions a predefined variable "DirectVideo"
which determines whether information is placed directly into video
memory, or sent through the BIOS. This variable is only effective
when your program includes the CRT unit in a "uses" clause (i.e.
'Uses CRT;').
When direct video memory writing is used, write and writeln operate
about 3 times faster than using BIOS. Quite an improvement! But
FastWriteNA (one of the routines listed in "FASTWR.PAS", Version
2.1) performs about 67 times faster than Turbo even when Turbo is
using direct video memory writing.


Conclusion:

These certainly aren't the most accurate or most exhaustive
benchmark tests that could be performed, but they should give you
an idea of how version 4.0 compares with version 3.x. Some of the
tricks used to increase speed in version 3.x are no longer
necessary, while others are still useful.

I would sure appreciate hearing about the things other people are
finding. If you come up with more accurate info, better speed up
techniques, or whatever, please place your info online for the rest
of us. You could also drop me a note: Bob Falk (El Paso, Texas),
CompuServe ID# 71420,2431.


Simple / useful routines:

As an afterthought the following routines were included; they may
be useful. These routines are very easy to use. They resemble
similar routines with the same names in Clipper, dBase, FoxBase,
QuickSilver, etc. They are heavily commented (too much probably),
so you may want to remove all or most of the comments if their size
is not to your liking. You can do whatever you want with them. They
are:

   Procedure Upper(Var Strng : MaxString);
      Converts all lower-case characters in a string to uppercase.

   Procedure Lower(Var Strng : MaxString);
      Converts all uppercase characters in a string to lower-case.

   Procedure Wait;
      Clears the keyboard buffer and waits for a keystroke before
      returning control to the program.

Both UPPER and LOWER are very fast!

Explanation:

Page 362 of the Turbo Manual (4.0) gives an example of an external
procedure to convert all characters in a string to uppercase, but
I'm not too swift with assembler so the simple InLine approach
seemed better here.
According to chapter 26 of the Turbo Pascal 4.0 manual, parameters
are passed to procedures and functions via the stack. In the case
of the procedures UPPER and LOWER we only pass one parameter and in
both cases it is a variable parameter (Var Strng: MaxString),
therefore it is passed by reference. This means that what is placed
on the stack is not the string itself, but a pointer to where the
actual string is stored. By the way, all string type parameters are
now passed this way (as a pointer, page 358 Turbo Manual).

Fortunately the manual for version 4.0 provides a great deal of
information on how all this stuff is passed. On page 358 we are
told that a pointer type parameter is passed as a double word
($:$). The segment is pushed first, and the offset is pushed next.
On page 366 of the manual we learn that "the value of a variable
identifier in an inline element is the offset address of the
variable within its base segment." Also we note that the variable
offset is relative to the BP register.

Now, in Jeff Duntemann's superb book "Turbo Pascal Solutions," he
explains not only how all this works, but also how to use that
information in actual practice. He even includes the "Eyeball
Inline Assembler" to help you figure out which opcode refers to
what mnemonic. Remarkably, even though we are using a VERY
different (much improved) version of Turbo Pascal, Mr. Duntemann's
book is by no means obsolete. Such a useful book as "Turbo Pascal
Solutions" should be on every Turbo Pascal programmer's bookshelf.

Anyway, appendix A of "Turbo Pascal Solutions" includes the opcodes
used to address pointer references (var parameters). He uses
ES:[DI] by placing the offset to the parameter (the last value
pushed onto the stack) into DI and the segment (the first value
pushed onto the stack) into ES. He even explains how to get this
32-bit address off the stack and into ES and DI, which is a
wonderful thing because I know beans about Assembler.
Beginning on page 90 of "Turbo Pascal Solutions" is an excellent
explanation of how all this stuff works. The instruction LES (Load
pointer using ES) reads the 32 bit address (the pointer to our
string location) from the stack. It doesn't really take it OFF the
stack, it just gets it from the stack. It puts the segment portion
into ES, and the offset portion into the register that you specify.
For example,

   LES DI,STRNG[BP]

means take the segment portion of parameter STRNG and put it in ES,
and take the offset portion of STRNG and put it in DI (remember
that the variable offset is relative to BP).

Now we can access our string and since in this procedure we are
going to change the string - rather than return a new modified
string as we would if this was a function - we don't have to mess
with the methods used to return a different pointer on the stack.

If you don't like the names of these procedures you may change
them. I like them because they are similar to the UPPER and LOWER
functions in Clipper which I've used a lot. Also, in my own use I
don't declare STRNG as type MaxString, but rather as "String232".
The declaration section of the program looks like this:

   Type
      String232 = String[232];

Why? Because a wide carriage printer, in compressed type face (17
pitch), can still only fit 232 characters on a line. So I rarely
use strings larger than 232 characters. So why not just save 23
bytes here and there?
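
For anyone who finds opcodes hard to follow, the steps the Inline
versions of UPPER and LOWER go through (pick up the length byte,
walk through the characters one at a time, and add or subtract 32
when a character falls in the right range) can be sketched in plain
Pascal. This is only a rough illustration for comparison: the names
UpperPas and LowerPas are made up, they are not the routines
supplied with this file, and they were not timed against the Inline
versions.

```pascal
Program CaseSketch;

{ Illustrative sketch only: UpperPas and LowerPas are hypothetical }
{ plain-Pascal stand-ins that follow the same steps as the Inline  }
{ code; they are not the routines supplied with this file.         }

Type
   MaxString = String[255];

Procedure UpperPas(Var Strng : MaxString);
Var
   I : Integer;
Begin
   For I := 1 to Length(Strng) do        { skipped for a null string }
      If (Strng[I] >= 'a') and (Strng[I] <= 'z') then
         Strng[I] := Chr(Ord(Strng[I]) - 32);       { 'a' -> 'A' }
End;

Procedure LowerPas(Var Strng : MaxString);
Var
   I : Integer;
Begin
   For I := 1 to Length(Strng) do        { skipped for a null string }
      If (Strng[I] >= 'A') and (Strng[I] <= 'Z') then
         Strng[I] := Chr(Ord(Strng[I]) + 32);       { 'A' -> 'a' }
End;

Var
   S : MaxString;

Begin
   S := 'Turbo Pascal 4.0';
   UpperPas(S);
   Writeln(S);                 { TURBO PASCAL 4.0 }
   LowerPas(S);
   Writeln(S);                 { turbo pascal 4.0 }
End.
```

Because Strng is a variable parameter, the string is changed in
place, just as it is through ES:[DI] in the Inline code.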
------------------------------------------------------------------

Procedure Upper( Var Strng : MaxString );

Begin
  Inline
   ($C4/$BE/STRNG/       { LES  DI,STRNG[BP] ; get string address  }
                         {   from the stack                        }
    $31/$C9/             { XOR  CX,CX ; clear CX to zero           }
    $26/$8A/$0D/         { MOV  CL,ES:[DI] ; place 1st byte of     }
                         {   string (the length byte) into CL      }
    $E3/$13/             { JCXZ +13h (19 bytes) ; if CX is 0       }
                         {   (null string) then jump +19 bytes     }
    $47/           {CNV:   INC  DI ; increment pointer to point    }
                         {   to next character in STRNG            }
    $26/$80/$3D/$7A/     { CMP  BYTE PTR ES:[DI],122 ; compare     }
                         {   character with a lower-case 'z'       }
    $77/$0A/             { JA   +0Ah (10 bytes) ; if char is > 'z' }
                         {   skip it and move on (it's extended)   }
    $26/$80/$3D/$61/     { CMP  BYTE PTR ES:[DI],97 ; compare      }
                         {   character with a lower-case 'a'       }
    $72/$04/             { JB   +4 (4 bytes) ; if char is < 'a'    }
                         {   it's not lower-case so move on        }
    $26/$80/$2D/$20/     { SUB  BYTE PTR ES:[DI],32 ; subtract     }
                         {   32 (hex 20) from char (convert)       }
    $E2/$ED);     {MORE:   LOOP -13h (19 bytes) ; loop back to     }
                         {   CNV and check next character          }
                  {DONE:   ; That's it, we are finished            }
End;   { end of procedure UPPER }


Procedure Lower( Var Strng : MaxString );

Begin
  Inline
   ($C4/$BE/STRNG/       { LES  DI,STRNG[BP] ; get string address  }
                         {   from the stack                        }
    $31/$C9/             { XOR  CX,CX ; clear CX to zero           }
    $26/$8A/$0D/         { MOV  CL,ES:[DI] ; place 1st byte of     }
                         {   string (the length byte) into CL      }
    $E3/$13/             { JCXZ +13h (19 bytes) ; if CX is 0       }
                         {   (null string) then jump +19 bytes     }
    $47/           {CNV:   INC  DI ; increment pointer to point    }
                         {   to next character in STRNG            }
    $26/$80/$3D/$5A/     { CMP  BYTE PTR ES:[DI],90 ; compare      }
                         {   character with an uppercase 'Z'       }
    $77/$0A/             { JA   +0Ah (10 bytes) ; if char is > 'Z' }
                         {   it's not uppercase (move on)          }
    $26/$80/$3D/$41/     { CMP  BYTE PTR ES:[DI],65 ; compare      }
                         {   character with an uppercase 'A'       }
    $72/$04/             { JB   +4 (4 bytes) ; if char is < 'A'    }
                         {   it's not uppercase so move on         }
    $26/$80/$05/$20/     { ADD  BYTE PTR ES:[DI],32 ; add 32       }
                         {   (hex 20) to character (convert)       }
    $E2/$ED);     {MORE:   LOOP -13h (19 bytes) ; loop back to     }
                         {   CNV and check next character          }
                  {DONE:   ; That's it, we are finished            }
End;   { end of procedure LOWER }

(Note: $ED is two's complement for -19.)


Procedure Wait;

{ Wait uses DOS function 12 ($C) of interrupt $21. Its purpose is
  to wait for a keystroke before returning control to the program;
  it doesn't return anything. It clears the keyboard buffer (in
  RAM) to prevent accidental continuation due to characters stored
  in the keyboard buffer. It was used because of the quirks in the
  Turbo Pascal KeyPressed function. I still use it because it is
  like the WAIT procedure in Clipper and dBase, and it's easier to
  use than "While not KeyPressed do;".

  DOS function $C is the "Clear Keyboard and Do" function. It
  clears the keyboard buffer and then executes one of 5 available
  DOS services (1,6,7,8 or A). The service is determined by
  placing its number into the AL register (remember AH contains
  $C). In this case we will choose service 7 - Direct Keyboard
  Input (without echo). The result will be that the keyboard
  buffer will be cleared, and the procedure will wait for a
  keystroke before returning control to the main program. }

Begin
  Inline
   ($B4/$0C/     { MOV AH,0Ch ; Place $C in AH. DOS Clear       }
                 {   Keyboard and Do service                    }
    $B0/$07/     { MOV AL,7   ; Place 7 in AL. DOS Direct       }
                 {   Console Input (without echo)               }
    $CD/$21);    { INT 21h    ; Call Interrupt 33 ($21), DOS    }
                 {   Function request                           }
End;   { end of Procedure Wait }

NOTE: I didn't want to upload 2 files in the .ARC file, so you'll
have to remove these procedures from this file with your word
processor, and then probably clean up the comments. They're there
so someone not too familiar with Assembler could understand them.